{
 "cells": [
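  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "c48511a5-f808-435e-9649-597b88fb2f55",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Run the shared utilities notebook in this namespace; small_model and\n",
    "# large_model, used in the cells below, are expected to be defined there.\n",
    "%run -i \"../util/lang_utils.ipynb\""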
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "be55de18-7373-4f84-81d5-12cb92c6560e",
   "metadata": {},
   "outputs": [],
   "source": [
    "article = \"\"\"iPhone 12: Apple makes jump to 5G\n",
    "Apple has confirmed its iPhone 12 handsets will be its first to work on faster 5G networks. \n",
    "The company has also extended the range to include a new \"Mini\" model that has a smaller 5.4in screen. \n",
    "The US firm bucked a wider industry downturn by increasing its handset sales over the past year. \n",
    "But some experts say the new features give Apple its best opportunity for growth since 2014, when it revamped its line-up with the iPhone 6. \n",
    "\"5G will bring a new level of performance for downloads and uploads, higher quality video streaming, more responsive gaming, \n",
    "real-time interactivity and so much more,\" said chief executive Tim Cook. \n",
    "There has also been a cosmetic refresh this time round, with the sides of the devices getting sharper, flatter edges. \n",
    "The higher-end iPhone 12 Pro models also get bigger screens than before and a new sensor to help with low-light photography. \n",
    "However, for the first time none of the devices will be bundled with headphones or a charger. \n",
    "Apple said the move was to help reduce its impact on the environment. \"Tim Cook [has] the stage set for a super-cycle 5G product release,\" \n",
    "commented Dan Ives, an analyst at Wedbush Securities. \n",
    "He added that about 40% of the 950 million iPhones in use had not been upgraded in at least three-and-a-half years, presenting a \"once-in-a-decade\" opportunity. \n",
    "In theory, the Mini could dent Apple's earnings by encouraging the public to buy a product on which it makes a smaller profit than the other phones. \n",
    "But one expert thought that unlikely. \n",
    "\"Apple successfully launched the iPhone SE in April by introducing it at a lower price point without cannibalising sales of the iPhone 11 series,\" noted Marta Pinto from IDC. \n",
    "\"There are customers out there who want a smaller, cheaper phone, so this is a proven formula that takes into account market trends.\" \n",
    "The iPhone is already the bestselling smartphone brand in the UK and the second-most popular in the world in terms of market share. \n",
    "If forecasts of pent up demand are correct, it could prompt a battle between network operators, as customers become more likely to switch. \n",
    "\"Networks are going to have to offer eye-wateringly attractive deals, and the way they're going to do that is on great tariffs and attractive trade-in deals,\" \n",
    "predicted Ben Wood from the consultancy CCS Insight. Apple typically unveils its new iPhones in September, but opted for a later date this year. \n",
    "It has not said why, but it was widely speculated to be related to disruption caused by the coronavirus pandemic. The firm's shares ended the day 2.7% lower. \n",
    "This has been linked to reports that several Chinese internet platforms opted not to carry the livestream, \n",
    "although it was still widely viewed and commented on via the social media network Sina Weibo.\"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "b0567e9c-33c5-48f1-ab9d-9921fbd330b1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "44\n",
      "12 7 9 CARDINAL\n",
      "Apple 11 16 ORG\n",
      "5 31 32 CARDINAL\n",
      "Apple 34 39 ORG\n",
      "12 65 67 CARDINAL\n",
      "first 89 94 ORDINAL\n",
      "5 113 114 CARDINAL\n",
      "5.4 216 219 CARDINAL\n",
      "US 235 237 GPE\n",
      "the past year 313 326 DATE\n",
      "Apple 372 377 ORG\n",
      "2014 416 420 DATE\n",
      "5 472 473 CARDINAL\n",
      "Tim Cook 661 669 PERSON\n",
      "12 813 815 CARDINAL\n",
      "first 934 939 ORDINAL\n",
      "Apple 1012 1017 ORG\n",
      "Tim Cook 1083 1091 PERSON\n",
      "5 1130 1131 CARDINAL\n",
      "Dan Ives 1162 1170 PERSON\n",
      "Wedbush Securities 1186 1204 ORG\n",
      "about 40% 1221 1230 PERCENT\n",
      "950 million 1238 1249 CARDINAL\n",
      "iPhones 1250 1257 ORG\n",
      "at least three 1290 1304 CARDINAL\n",
      "Mini 1384 1388 PERSON\n",
      "Apple 1400 1405 ORG\n",
      "one 1523 1526 CARDINAL\n",
      "Apple 1559 1564 ORG\n",
      "April 1604 1609 DATE\n",
      "iPhone 11 1686 1695 LAW\n",
      "Marta Pinto 1711 1722 PERSON\n",
      "iPhone 1873 1879 ORG\n",
      "UK 1931 1933 GPE\n",
      "second 1942 1948 ORDINAL\n",
      "Ben Wood 2312 2320 PERSON\n",
      "CCS Insight 2342 2353 ORG\n",
      "Apple 2355 2360 ORG\n",
      "iPhones 2387 2394 ORG\n",
      "September 2398 2407 DATE\n",
      "a later date this year 2423 2445 DATE\n",
      "2.7% 2594 2598 PERCENT\n",
      "Chinese 2652 2659 NORP\n",
      "Sina Weibo 2797 2807 PERSON\n"
     ]
    }
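   ],
   "source": [
    "# Run the small model over the article and keep its entities for later comparison.\n",
    "doc = small_model(article)\n",
    "print(len(doc.ents))\n",
    "small_model_ents = doc.ents\n",
    "for ent in doc.ents:\n",
    "    # Entity text, start/end character offsets in the article, and predicted label.\n",
    "    print(ent.text, ent.start_char, ent.end_char, ent.label_)"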
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e7580e76-447e-4fdc-b0df-18061d5c372b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "46\n",
      "12 7 9 CARDINAL\n",
      "Apple 11 16 ORG\n",
      "5 31 32 CARDINAL\n",
      "G\n",
      "Apple 32 39 ORG\n",
      "12 65 67 CARDINAL\n",
      "first 89 94 ORDINAL\n",
      "5 113 114 CARDINAL\n",
      "5.4 216 219 CARDINAL\n",
      "US 235 237 GPE\n",
      "the past year 313 326 DATE\n",
      "Apple 372 377 ORG\n",
      "2014 416 420 DATE\n",
      "6 467 468 CARDINAL\n",
      "5 472 473 CARDINAL\n",
      "Tim Cook 661 669 PERSON\n",
      "12 813 815 CARDINAL\n",
      "Pro 816 819 PRODUCT\n",
      "first 934 939 ORDINAL\n",
      "Apple 1012 1017 ORG\n",
      "Tim Cook 1083 1091 PERSON\n",
      "5 1130 1131 CARDINAL\n",
      "Dan Ives 1162 1170 PERSON\n",
      "Wedbush Securities 1186 1204 ORG\n",
      "about 40% 1221 1230 PERCENT\n",
      "950 million 1238 1249 CARDINAL\n",
      "at least three 1290 1304 CARDINAL\n",
      "Mini 1384 1388 ORG\n",
      "Apple 1400 1405 ORG\n",
      "one 1523 1526 CARDINAL\n",
      "Apple 1559 1564 ORG\n",
      "SE 1598 1600 PRODUCT\n",
      "April 1604 1609 DATE\n",
      "11 1693 1695 CARDINAL\n",
      "Marta Pinto 1711 1722 PERSON\n",
      "IDC 1728 1731 ORG\n",
      "UK 1931 1933 GPE\n",
      "second 1942 1948 ORDINAL\n",
      "Ben Wood 2312 2320 PERSON\n",
      "CCS Insight 2342 2353 ORG\n",
      "Apple 2355 2360 ORG\n",
      "September 2398 2407 DATE\n",
      "a later date this year 2423 2445 DATE\n",
      "the day 2586 2593 DATE\n",
      "2.7% 2594 2598 PERCENT\n",
      "Chinese 2652 2659 NORP\n",
      "Sina Weibo 2797 2807 PERSON\n"
     ]
    }
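   ],
   "source": [
    "# Repeat with the large model; note that doc is overwritten here.\n",
    "doc = large_model(article)\n",
    "print(len(doc.ents))\n",
    "large_model_ents = doc.ents\n",
    "for ent in doc.ents:\n",
    "    # Entity text, start/end character offsets in the article, and predicted label.\n",
    "    print(ent.text, ent.start_char, ent.end_char, ent.label_)"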
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "4f71a8e0-84b8-4723-9beb-f22dbea10931",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'iPhone 11', 'iPhone', 'iPhones'}\n",
      "{'6', 'the day', 'IDC', '11', 'Pro', 'G\\nApple', 'SE'}\n"
     ]
    }
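   ],
   "source": [
    "# Compare the two models by entity text only; labels and offsets are ignored,\n",
    "# so an entity such as 'Mini' that only changes label does not show up here.\n",
    "# Separate names keep the original entity spans intact for later use.\n",
    "small_ent_texts = {ent.text for ent in small_model_ents}\n",
    "large_ent_texts = {ent.text for ent in large_model_ents}\n",
    "in_small_not_in_large = small_ent_texts - large_ent_texts\n",
    "in_large_not_in_small = large_ent_texts - small_ent_texts\n",
    "print(in_small_not_in_large)\n",
    "print(in_large_not_in_small)"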
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
